Xinyuan Qian

Beyond Lips: Integrating Gesture and Lip Cues for Robust Audio-visual Speaker Extraction

Jan 27, 2026

Analytic Incremental Learning For Sound Source Localization With Imbalance Rectification

Jan 26, 2026

PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models

Jan 07, 2026

Region-Specific Audio Tagging for Spatial Sound

Sep 11, 2025

VP-SelDoA: Visual-prompted Selective DoA Estimation of Target Sound via Semantic-Spatial Matching

Jul 10, 2025

Exploring Length Generalization For Transformer-based Speech Enhancement

Jun 07, 2025

FIGhost: Fluorescent Ink-based Stealthy and Flexible Backdoor Attacks on Physical Traffic Sign Recognition

May 17, 2025

Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture

Apr 21, 2025

FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles

Jan 02, 2025

Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition

Jan 01, 2025